Analyzing the Impact of COVID-19 on Global Population Dynamics
Team Members:
Siddhartha Alwala
Vishnu Priya Palagiri
Shiva Sai Kumar Tummala
Bhargavi Kondaveeti
Introduction
Our project focuses on visualizing the impact of COVID-19 on global population
dynamics, aiming to provide insights into how the pandemic has affected
different regions and demographic groups.
Through dynamic and interactive visualizations, we aim to uncover patterns,
trends, and correlations within the data, facilitating informed decision-making
and effective response strategies.
Workflow Explanation
Data Collection:
Getting relevant datasets from Kaggle, a popular platform for data science datasets.
Initial Visualization:
Utilizing D3.js to visualize the uncleaned dataset, providing an initial overview of the data's structure and
potential insights.
Data Cleaning:
Using Python programming language and libraries like Pandas and NumPy to clean the dataset, handling
missing values, outliers, and inconsistencies.
Refined Visualization:
Using Python's data visualization libraries such as Matplotlib or Seaborn to create more refined visualizations
based on the cleaned dataset, focusing on specific variables of interest.
Dashboard Creation:
Utilizing Microsoft Power BI to design interactive dashboards and reports, incorporating the refined
visualizations to provide a comprehensive view of the analyzed data.
Report Generation:
Creating detailed reports summarizing the analysis findings, insights, and conclusions drawn from the
visualizations and data analysis process.
Data Abstraction - Dataset (Type and Attributes)
COVID-19 Dataset:
Type: CSV
Attributes:
Date: Date of the record
Country/Region: Name of the country or region
Confirmed Cases: Number of confirmed COVID-19 cases
Deaths: Number of deaths due to COVID-19
Recoveries: Number of recovered COVID-19 cases
Other attributes are also present in the dataset.
Life Expectancy Dataset:
Type: CSV
Attributes:
Country/Region: Name of the country or region
Population Size: Total population of the country or region
Life Expectancy: Average life expectancy in years
Development Status: Development status of the country or region (e.g., developed, developing)
GDP (Gross Domestic Product): Economic output of the country or region
Other attributes are also present in the dataset.
Data Abstraction - Detailed Description of Dataset:
The COVID-19 dataset contains weekly records of COVID-19 cases reported by
country or region. Each record includes information on confirmed cases,
deaths, and recoveries, allowing for the analysis of the pandemic's impact
over time and across different regions.
The global population dataset provides demographic information for each
country or region, including population size, life expectancy, and
development status. This dataset offers insights into the population dynamics
of different regions and their susceptibility to the pandemic.
The economic indicators include GDP values for each country or region,
enabling the analysis of the economic impact of the pandemic. GDP serves as
a measure of economic output and can help assess the extent of economic
disruption caused by COVID-19.
Data transformation tasks involve:
Cleaning and preprocessing the datasets to ensure consistency and accuracy.
Handling missing values, standardizing column names, and converting data
types as necessary.
Aggregating and merging relevant information from multiple sources for
analysis.
Ensuring data quality through validation checks and addressing any
discrepancies or inconsistencies.
Normalizing or scaling attributes if required for certain analyses.
Performing feature engineering to create additional features if needed for
enhanced analysis.
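The transformation tasks listed above can be sketched with Pandas roughly as follows. This is a minimal illustration on made-up rows, not the project's actual cleaning script. The column names, the helper `clean`, and the choice to forward-fill gaps (which assumes the counts are cumulative) are all assumptions.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the raw COVID-19 CSV; values are illustrative only.
raw = pd.DataFrame({
    " Date ": ["2020-03-01", "2020-03-08", "2020-03-08", "2020-03-15"],
    "Country/Region ": ["Italy", "Italy", "Italy", "Italy"],
    " Confirmed": ["1694", "7375", "7375", None],
    "Deaths ": [34, 366, 366, np.nan],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Standardize names, convert types, drop duplicates, fill gaps."""
    out = df.copy()
    # Standardize column names by removing stray whitespace.
    out.columns = out.columns.str.strip()
    # Convert data types: dates to datetime, counts to numbers.
    out["Date"] = pd.to_datetime(out["Date"])
    counts = ["Confirmed", "Deaths"]
    for col in counts:
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Remove exact duplicate records.
    out = out.drop_duplicates()
    # Assumption: counts are cumulative, so a gap can reuse the
    # previous value reported for the same country.
    out = out.sort_values(["Country/Region", "Date"])
    out[counts] = out.groupby("Country/Region")[counts].ffill()
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```

Forward-filling is only one option; dropping incomplete rows or interpolating may suit other analyses better.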
Task Abstraction
Task:
Target: The primary objective is to analyze the impact of COVID-19 on global population
dynamics, focusing on key metrics such as confirmed cases, deaths, and recoveries.
Actions:
Analyze trends and patterns in COVID-19 cases across different regions and countries.
Explore correlations between COVID-19 metrics and demographic factors such as population
size, life expectancy, and development status.
Investigate the economic implications of the pandemic by examining the relationship
between COVID-19 cases and GDP.
Compare average life expectancy and GDP across WHO regions to identify disparities and
inform targeted interventions.
Workflow Explanation
Data Collection:
Gather COVID-19 data from reliable sources like WHO or national health agencies.
Collect demographic data from reputable sources like the World Bank or United Nations.
Data Analysis:
Analyze COVID-19 trends globally and regionally to identify patterns.
Conduct correlation analysis between COVID-19 metrics and demographic factors.
Use statistical methods to explore the relationship between COVID-19 cases and GDP.
Economic Analysis:
Investigate the impact of COVID-19 on GDP by analyzing economic data.
Explore how COVID-19 cases have affected different sectors of the economy.
Consider the long-term economic implications of the pandemic.
Comparison across WHO Regions:
Compare average life expectancy and GDP across different WHO regions.
Identify disparities and variations in population dynamics and economic development.
Use these comparisons to inform targeted interventions and policy decisions.
Reporting and Recommendations:
Interpret the findings from the analysis.
Provide recommendations for policymakers and stakeholders based on the insights gained.
Summarize the key points and implications for future action.
Implementation using Tools:
D3.js:
Description: D3.js (Data-Driven Documents) was utilized for the initial visualization of the uncleaned dataset. It provided a
platform for creating dynamic and interactive charts and graphs directly within web browsers.
Usage: Through D3.js, we generated interactive visualizations such as bar charts to explore the structure and patterns within
the raw dataset.
Python:
Description: Python, along with libraries like Pandas, NumPy, Matplotlib, and Seaborn, played a crucial role in data
preprocessing, analysis, and visualization.
Usage: Pandas and NumPy were employed for data cleaning tasks, including handling missing values, outliers, and
inconsistencies. Matplotlib and Seaborn were used to create refined visualizations based on the cleaned dataset, showcasing
insights through various types of charts and plots.
Microsoft Power BI:
Description: Microsoft Power BI served as a comprehensive platform for designing interactive dashboards and reports,
integrating visualizations to provide a comprehensive view of the analyzed data.
Usage: Power BI enabled us to create interactive visualizations, including bar charts, line graphs, and maps, and seamlessly
integrate them into interactive dashboards. Additionally, Power BI's data modeling capabilities allowed for the creation of
relationships between different datasets, enhancing the depth of analysis in the final reports.
Results for Analysis - Demonstration
Initial Visualization Using D3.js: Pie Chart of COVID-19 Cases by WHO Region (pre-cleaning)
Explanation:
The pie chart provides a clear overview of how COVID-19 cases are
distributed among different WHO regions.
We can see at a glance which regions have the highest number of
confirmed cases, enabling policymakers and health organizations to
allocate resources effectively.
Regions with larger segments may require more attention in terms of
healthcare infrastructure and preventive measures.
Story:
Imagine global health officials convening to discuss strategies for combating
the spread of COVID-19. As they gather around a screen displaying the pie
chart, they immediately notice the significant portion representing the
European region, indicating a high number of confirmed cases.
This prompts discussions on implementing stricter containment measures
and increasing healthcare capacity in Europe.
Similarly, the smaller segments representing regions with fewer cases spark
conversations about sharing resources and best practices to prevent further
outbreaks.
Stacked Bar Chart of COVID-19 Cases by Country Using D3.js
Explanation:
The stacked bar chart allows for a detailed examination of COVID-19 cases
within individual countries. By breaking down each bar into segments for
confirmed cases, deaths, and recoveries, viewers can see not only the
total number of cases but also the outcomes of those cases. This provides
insights into how countries are managing the pandemic, including their
healthcare systems' capacity to treat patients and mitigate fatalities.
Story:
As policymakers analyze the stacked bar chart, they focus on the
disparities in outcomes among the top 10 countries with the highest
confirmed cases. They notice that while some countries have a high
number of confirmed cases, they've also managed to achieve a substantial
number of recoveries, indicating effective healthcare interventions.
Conversely, countries with a high proportion of deaths prompt discussions
on implementing measures to reduce mortality rates, such as improving
access to critical care facilities and vaccine distribution.
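For readers who work in Python rather than D3.js, the same stacked layout can be approximated as below. This is not the project's D3 code. It is a small pandas/Matplotlib analogue on made-up totals, and the column names and the helper `top_n_outcomes` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up totals, one row per country; not real figures.
totals = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [900, 400, 1500, 50],
    "Deaths": [45, 8, 30, 1],
    "Recovered": [600, 350, 1200, 40],
})

def top_n_outcomes(df: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Keep the n countries with the most confirmed cases and add death share."""
    top = df.nlargest(n, "Confirmed").set_index("Country/Region")
    top["Death share"] = top["Deaths"] / top["Confirmed"]
    return top

top = top_n_outcomes(totals, n=3)

# One bar per country, with one stacked segment per metric.
fig, ax = plt.subplots()
top[["Confirmed", "Deaths", "Recovered"]].plot.bar(stacked=True, ax=ax)
ax.set_ylabel("Cases")
ax.set_title("COVID-19 outcomes, top countries by confirmed cases")
fig.tight_layout()
```

The `Death share` column gives the proportion of deaths that the story refers to, which is hard to read from absolute bar heights alone.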
After Cleaning Datasets Using Python and Visualizing Using Matplotlib: Exploring Global Health Data
Explanation:
Data Cleaning: Removes extra spaces from column names for
consistency.
Summary Statistics: Computes key statistics for numeric
columns.
Visualization: Displays the distribution of life expectancy using
a histogram.
Story:
In a health research project, data cleaning ensures consistency
in column names, aiding smooth analysis.
Summary statistics reveal crucial insights into life expectancy
trends worldwide.
Visualizing life expectancy distributions highlights variations,
guiding targeted interventions for global health improvement.
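The three steps on this slide (trim column names, summarize, plot a histogram) might look like the sketch below. The data frame is a made-up stand-in for the life expectancy CSV, and the column name `Life expectancy` is an assumption about the file's header.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows with padded header names, mimicking an uncleaned CSV.
life = pd.DataFrame({
    "Country ": ["A", "B", "C", "D", "E"],
    " Life expectancy ": [52.0, 61.5, 70.2, 78.9, 82.3],
})

# Data cleaning: remove extra spaces from column names.
life.columns = life.columns.str.strip()

# Summary statistics for the numeric columns.
summary = life.describe()
print(summary)

# Visualization: distribution of life expectancy as a histogram.
fig, ax = plt.subplots()
counts, edges, _ = ax.hist(life["Life expectancy"], bins=5, edgecolor="black")
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Number of countries")
ax.set_title("Distribution of life expectancy")
```

Without the `str.strip()` step, a lookup such as `life["Life expectancy"]` would raise a `KeyError`, because the padded header is a different string.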
Understanding Global Population Dynamics: Distribution Analysis
Explanation:
Data Preparation: Ensures consistency in column names for
clarity.
Statistical Insight: Reveals key population statistics for
analysis.
Visual Representation: Depicts population distribution via
histogram and KDE.
Story:
In a demographic study, data is prepared by standardizing
column names to streamline analysis. Statistical examination
unveils crucial population trends, essential for informed
decision-making. Visualizing population distributions
illuminates demographic disparities, guiding policy
interventions for sustainable development.
Exploring Global Economic Patterns: Distribution of GDP
Explanation:
Data Preparation: Cleans column names, ensuring
uniformity.
Statistical Analysis: Computes summary statistics for GDP.
Visualization: Displays GDP distribution via a histogram
with KDE.
Story:
In an economic analysis endeavor, data is prepped by
standardizing column names for clarity. Statistical analysis
uncovers insights into global GDP trends, vital for
economic policy formulation. Visualizing GDP distributions
highlights disparities, informing strategies for balanced
economic development.
Analyzing Numeric Relationships: Correlation Heatmap
Explanation:
Data Filtering: Retains only numeric columns for correlation
analysis.
Exploratory Analysis: Examines correlations between numeric
variables.
Visual Insight: Presents correlations via heatmap with
annotations.
Story:
In a data exploration endeavor, non-numeric columns are
filtered out to focus on quantitative relationships.
Through exploratory analysis, correlations among numeric
variables are investigated, offering insights into
interdependencies.
Visualizing correlations via a heatmap aids in identifying
patterns and guiding further analysis for informed decision-making.
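The filter, correlate, and annotate steps above can be sketched as follows. The rows are made up and the column names are assumptions. `select_dtypes`, `corr`, and `sns.heatmap` are standard pandas/Seaborn calls, but the colour map and value limits are only one reasonable choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows mixing text and numeric columns.
df = pd.DataFrame({
    "Country": ["A", "B", "C", "D", "E"],
    "Status": ["Developing", "Developed", "Developing", "Developed", "Developing"],
    "Life expectancy": [55.0, 81.0, 63.0, 79.0, 70.0],
    "GDP": [900.0, 42000.0, 2500.0, 38000.0, 6000.0],
    "Population": [3.0e7, 6.0e7, 1.2e8, 1.0e7, 2.0e8],
})

# Data filtering: keep only numeric columns.
numeric = df.select_dtypes(include="number")

# Exploratory analysis: pairwise Pearson correlations.
corr = numeric.corr()

# Visual insight: annotated heatmap on a fixed -1..1 colour scale.
fig, ax = plt.subplots()
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation between numeric variables")
```

Fixing `vmin` and `vmax` keeps the colours comparable between heatmaps drawn from different subsets of the data.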
Analyzing COVID-19 Trends: Summary Statistics and Time Series Plot
Explanation:
Statistical Overview: Generates summary statistics for COVID-19
data.
Temporal Analysis: Plots trends of confirmed cases, deaths, and
recoveries over time.
Story:
In assessing the COVID-19 pandemic, summary statistics offer
insights into the overall magnitude and variability of cases.
Meanwhile, the time series plot depicts the progression of
confirmed cases, deaths, and recoveries over time, enabling the
observation of trends and fluctuations.
Through these analyses, policymakers and health authorities can
better understand the impact of the pandemic and formulate
effective response strategies.
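A time series of this kind is typically built by summing each metric over all countries for every date. The sketch below shows that aggregation on made-up records. The column names follow the attribute list given earlier in the deck but are still assumptions about the actual CSV header.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up records: two countries, two reporting dates.
covid = pd.DataFrame({
    "Date": ["2020-03-01", "2020-03-01", "2020-03-08", "2020-03-08"],
    "Country/Region": ["A", "B", "A", "B"],
    "Confirmed": [10, 5, 40, 20],
    "Deaths": [0, 1, 2, 3],
    "Recovered": [1, 0, 8, 4],
})
covid["Date"] = pd.to_datetime(covid["Date"])
metrics = ["Confirmed", "Deaths", "Recovered"]

# Statistical overview of the three metrics.
print(covid[metrics].describe())

# Temporal analysis: global totals per date, one line per metric.
per_date = covid.groupby("Date")[metrics].sum()
fig, ax = plt.subplots()
per_date.plot(ax=ax, marker="o")
ax.set_ylabel("Count")
ax.set_title("COVID-19 totals over time")
```

Parsing `Date` with `pd.to_datetime` before grouping lets Matplotlib space the points by actual time rather than by row order.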
Visualizing COVID-19 Metrics on a World Map
Explanation:
Data Loading: Imports the world map shapefile and COVID-19 data.
Data Merging: Merges COVID-19 data with world map data based on country names.
Map Visualization: Plots COVID-19 metrics (confirmed cases, deaths, recoveries) on the world
map.
Story:
In mapping the global impact of COVID-19, the world map shapefile is loaded alongside
COVID-19 data.
By merging these datasets, a comprehensive view of the pandemic's spread is achieved,
allowing for spatial analysis.
The resulting visualizations depict the distribution of confirmed cases, deaths, and recoveries
across countries, aiding in understanding regional variations and informing targeted response
efforts.
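Merging on country names tends to fail silently when the two sources spell a name differently. The sketch below shows one way to detect such mismatches before plotting. It uses plain data frames in place of the shapefile's attribute table. The column `name`, the mapping `name_fixes`, and the helper `merge_with_report` are hypothetical, and the rows are made up.

```python
import pandas as pd

# Stand-in for the world map's attribute table (geometry omitted).
world = pd.DataFrame({"name": ["United States of America", "Italy", "Brazil"]})

# Made-up COVID-19 totals using a different spelling for one country.
covid = pd.DataFrame({
    "Country/Region": ["US", "Italy", "Brazil"],
    "Confirmed": [300, 200, 100],
})

def merge_with_report(world_df, covid_df, fixes=None):
    """Left-join map rows to COVID-19 rows and list map names left unmatched."""
    renamed = covid_df.assign(name=covid_df["Country/Region"].replace(fixes or {}))
    merged = world_df.merge(renamed, on="name", how="left", indicator=True)
    unmatched = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
    return merged.drop(columns="_merge"), unmatched

# Without any fixes, one map country gets no data.
_, missing_before = merge_with_report(world, covid)

# Hypothetical mapping from the data's spelling to the map's spelling.
name_fixes = {"US": "United States of America"}
merged, missing_after = merge_with_report(world, covid, name_fixes)

print(missing_before, missing_after)
```

A left join keeps every map polygon, so countries without data can still be drawn (for example in grey) instead of disappearing from the map.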
Exploring the Relationship Between Life Expectancy and COVID-19 Cases
Explanation:
Data Merging: Merges datasets based on the 'Country/Region' column.
Scatter Plot: Visualizes the relationship between life expectancy and confirmed
COVID-19 cases.
Story:
By merging datasets on country/region identifiers, a comprehensive dataset is
created for analysis.
Through a scatter plot, the correlation between life expectancy and confirmed
COVID-19 cases is explored, offering insights into potential health disparities and
vulnerabilities.
This analysis contributes to understanding how demographic factors may influence
the spread and impact of the pandemic.
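The merge and scatter plot described above could be sketched as follows, with a Pearson coefficient added as a numeric summary of the plot. The rows are made up, the column names are assumptions, and the coefficient is an illustrative extra rather than something the slide reports.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up tables; country D and country E each appear in only one of them.
covid = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "Confirmed": [120, 4500, 900, 30],
})
life = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "E"],
    "Life expectancy": [58.0, 81.0, 72.5, 66.0],
})

# Data merging: an inner join keeps only countries present in both tables.
merged = covid.merge(life, on="Country/Region", how="inner")

# Pearson correlation between the two plotted variables.
r = merged["Life expectancy"].corr(merged["Confirmed"])

# Scatter plot of life expectancy against confirmed cases.
fig, ax = plt.subplots()
ax.scatter(merged["Life expectancy"], merged["Confirmed"])
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Confirmed cases")
ax.set_title(f"Life expectancy vs. confirmed cases (r = {r:.2f})")
```

Raw case counts also grow with population size, so dividing by population before plotting may give a fairer comparison between countries.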
Examining Life Expectancy and COVID-19 Cases Across Development Status
Explanation:
Data Visualization: Displays a scatter plot of life expectancy against
confirmed COVID-19 cases.
Color Representation: Categorizes data points by development status,
enhancing visual interpretation.
Story:
In this scatter plot analysis, life expectancy is juxtaposed against
confirmed COVID-19 cases, with data points color-coded according to
development status.
The visualization facilitates the identification of potential correlations
between socioeconomic factors and pandemic outcomes.
By examining disparities across development statuses, policymakers
can tailor interventions to address health inequities and mitigate the
impact of the pandemic on vulnerable populations.
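Colour-coding by development status amounts to drawing one scatter series per group. The sketch below does this with plain Matplotlib on made-up rows. The column names, including `Status`, are assumptions, and the logarithmic y-axis is a presentation choice that the slide does not specify.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up merged table: one row per country.
merged = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D", "E"],
    "Status": ["Developing", "Developed", "Developing", "Developed", "Developing"],
    "Life expectancy": [55.0, 81.0, 63.0, 79.0, 70.0],
    "Confirmed": [800, 250000, 12000, 90000, 40000],
})

fig, ax = plt.subplots()
# One scatter series (and one legend entry) per development status.
for status, group in merged.groupby("Status"):
    ax.scatter(group["Life expectancy"], group["Confirmed"], label=status)

ax.set_yscale("log")  # case counts span several orders of magnitude
ax.set_xlabel("Life expectancy (years)")
ax.set_ylabel("Confirmed cases (log scale)")
ax.legend(title="Development status")
```

The same loop works for the GDP scatter plot by swapping the x column. Seaborn's `scatterplot` with a `hue` argument is a shorter alternative.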
Exploring GDP and COVID-19 Cases Across Development Status
Explanation:
Visualization: Presents a scatter plot of GDP against confirmed COVID-19 cases.
Color Coding: Categorizes data points by development status for
enhanced insight.
Story:
This scatter plot delves into the relationship between GDP and
confirmed COVID-19 cases, with data points color-coded based on
development status.
By examining how economic factors intersect with pandemic
outcomes, stakeholders can better understand the socioeconomic
dimensions of the crisis.
This analysis aids in identifying disparities in vulnerability and
resilience across different economic strata, guiding targeted
interventions for pandemic response and recovery.
Comparing Average GDP Across WHO Regions
Explanation:
Visualization: Presents a bar plot of average GDP by WHO region.
Data Aggregation: Computes the mean GDP for each WHO region for
comparison.
Story:
This bar plot showcases the average GDP across different WHO
regions, providing insights into regional economic disparities.
By aggregating GDP data, the plot highlights variations in economic
development among WHO regions.
Stakeholders can use this analysis to prioritize resource allocation
and development initiatives, aiming to address economic
inequalities and promote sustainable growth globally.
Comparing Average Life Expectancy Across WHO Regions
Explanation:
Visualization: Displays a bar plot of average life expectancy by WHO region.
Statistical Summary: Calculates the mean life expectancy for each WHO region for
comparison.
Story:
This bar plot illustrates the average life expectancy across various WHO regions,
shedding light on regional health disparities.
By analyzing average life expectancy data, stakeholders can identify regions with
lower life expectancies and prioritize health interventions accordingly.
This analysis aids in understanding global health outcomes and guiding efforts to
improve population health and well-being across different regions.
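The regional averages behind this bar plot, and behind the GDP comparison before it, come from a group-by-mean aggregation. The sketch below shows that step on made-up rows. The column names, including `WHO Region`, are assumptions about the merged table.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Made-up country-level rows tagged with a WHO region.
df = pd.DataFrame({
    "Country/Region": ["A", "B", "C", "D"],
    "WHO Region": ["Europe", "Africa", "Europe", "Africa"],
    "Life expectancy": [80.0, 60.0, 82.0, 64.0],
    "GDP": [40000.0, 2000.0, 50000.0, 4000.0],
})

# Mean of each indicator per WHO region, sorted for easier reading.
regional = (
    df.groupby("WHO Region")[["Life expectancy", "GDP"]]
    .mean()
    .sort_values("Life expectancy")
)
print(regional)

# Bar plot of average life expectancy by region.
fig, ax = plt.subplots()
regional["Life expectancy"].plot.bar(ax=ax)
ax.set_ylabel("Average life expectancy (years)")
ax.set_title("Average life expectancy by WHO region")
fig.tight_layout()
```

This is an unweighted mean over countries, so a small country counts as much as a large one. Weighting by population would answer a different question.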
Microsoft Power BI
Work Management
Work Completed:
Data Acquisition and Cleaning:
Description: Obtained COVID-19, global population, and economic datasets. Cleaned data to ensure
consistency and accuracy.
Responsibility: All team members.
Contributions: 100% by each member.
Initial Visualization Using D3.js:
Description: Created a pie chart of COVID-19 cases by WHO region using D3.js.
Responsibility: Siddhartha Alwala, Shiva Sai Kumar Tummala.
Contributions: 50% each.
Stacked Bar Chart of COVID-19 Cases:
Description: Developed a stacked bar chart showing COVID-19 cases by country using D3.js.
Responsibility: Vishnu Priya Palagiri, Bhargavi Kondaveeti.
Contributions: 50% each.
Cleaning Datasets Using Python and Visualization with Matplotlib:
Description: Cleaned datasets using Python. Visualized global health data with Matplotlib.
Responsibility: All team members.
Contributions: 100% by each member.
Exploring Global Population Dynamics:
Description: Analyzed population distributions and trends.
Responsibility: Siddhartha Alwala, Shiva Sai Kumar Tummala.
Contributions: 50% each.
Exploring Global Economic Patterns:
Description: Examined GDP distributions and correlations with COVID-19 cases.
Responsibility: Vishnu Priya Palagiri, Bhargavi Kondaveeti.
Contributions: 50% each.
Understanding Numeric Relationships:
Description: Investigated correlations between variables using a heatmap.
Responsibility: All team members.
Contributions: 100% by each member.
Analyzing COVID-19 Trends:
Description: Generated summary statistics and time series plots for COVID-19
data.
Responsibility: All team members.
Contributions: 100% by each member.
Visualizing COVID-19 Metrics on a World Map:
Description: Plotted COVID-19 metrics on a world map.
Responsibility: Siddhartha Alwala, Shiva Sai Kumar Tummala.
Contributions: 50% each.
Exploring Relationships Between Life Expectancy and COVID-19:
Description: Investigated the relationship between life expectancy and COVID-19
cases.
Responsibility: Vishnu Priya Palagiri, Bhargavi Kondaveeti.
Contributions: 50% each.
Overall Contributions:
Siddhartha Alwala: 25%, Shiva Sai Kumar Tummala: 25%, Vishnu Priya Palagiri: 25%, Bhargavi Kondaveeti: 25%
References:
1. World Health Organization. (2022). COVID-19 Dashboard. [Online]. Available: https://www.who.int/emergencies/disease-outbreak-news/item/2020-DON233. [Accessed: April 16, 2024].
2. D3.js Documentation: https://d3js.org/
3. Matplotlib Documentation: https://matplotlib.org/stable/contents.html
4. Pandas Documentation: https://pandas.pydata.org/docs/
Links:
https://vizhub.com/bhargavikondaveeti/c273a818d4d04170b1a620d29f5a6bbe
https://vizhub.com/bhargavikondaveeti/c273a818d4d04170b1a620d29f5a6bbe?mode=embed
https://gist.githubusercontent.com/bhargavikondaveeti/209440046523e6eec9b6d4efc334f60f/raw/2d1ed33decd151c0aa10584d6e570f8175c5f0b0/project_covid_data.csv
https://gist.githubusercontent.com/bhargavikondaveeti/44c211fa68660eaa099633f5a716129b/raw/030d27261f13125b7c81aed94f2ae6c0f6ab3dd2/life_expectancy_verma.csv
Project Final Video.mp4
https://siddharthaalwala.github.io/CSCE_5320_Final_project/